Contextual Document Clustering
Abstract
In this paper we present a novel algorithm for document clustering. The approach is based on distributional clustering: subject-related words that have a narrow context are identified and used as metatags for that subject. These contextual words form the basis for creating thematic clusters of documents. We believe that this approach will be invaluable in creating an information retrieval system that allows for a more personalized and intelligent search and retrieval mechanism. As in other research papers on document clustering, we analyze the quality of this approach with respect to document categorization problems and show that it outperforms the information-theoretic method of sequential information bottleneck.
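The abstract does not say how the narrowness of a word's context is measured. As a rough illustration only, the sketch below assumes it can be approximated by the Shannon entropy of the distribution of words that co-occur with a given word in the same document: the lower the entropy, the narrower the context. The function names, whitespace tokenization, and toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def context_entropy(word, docs):
    """Entropy (in bits) of the words that co-occur with `word` in the same document.

    Assumption for illustration: a low value stands in for a "narrow context".
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if word in tokens:
            # Count every other token in documents that contain the word.
            counts.update(t for t in tokens if t != word)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # the word never co-occurs with anything
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_context(docs):
    """Return (word, entropy) pairs sorted from narrowest to broadest context."""
    vocab = {t for doc in docs for t in doc.lower().split()}
    return sorted(((w, context_entropy(w, docs)) for w in vocab),
                  key=lambda pair: (pair[1], pair[0]))


# Hypothetical toy corpus: "match" appears in both sports and politics documents.
docs = [
    "goal striker football match",
    "football match referee goal",
    "election vote parliament match",
    "parliament vote election debate",
]

for word, entropy in rank_by_context(docs):
    print(f"{word}: {entropy:.3f}")
```

In this toy corpus, "football" co-occurs only with sports vocabulary and so scores a lower entropy than "match", which appears across both topics. A realistic version would also need to filter out rare words, since a word seen in a single document trivially has a narrow context under this measure.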
Similar resources
Contextual Abstraction Based Clustering Technique for Effective Text Document Mining
Document clustering is considered an essential process for grouping unsupervised documents for effective applications in text mining and information retrieval. Recently, many research works have been developed for text document clustering. However, the performance of text document clustering is still not effective. To overcome this limitation, a novel Contextual Abstraction based Do...
Document Clustering with Explicit Semantic Analysis (ESA)
Document clustering has recently become a vital approach as the number of documents on the web and in proprietary repositories increases at an unprecedented rate. Documents written in human language generally contain some context, and the usage of words depends mainly on that context. Recently, researchers have attempted to enrich document representation via external knowledge bases. This...
A Dynamic Programming Approach To Document Clustering Based On Term Sequence Alignment
Document clustering is an unsupervised machine learning technique that, when provided with a large document corpus, automatically subdivides it into meaningful smaller sub-collections called clusters. Currently, document clustering algorithms use sequences of words (terms) to compactly represent documents and define a similarity function based on the sequences. We believe that the word sequence is...
An Exploration into the Use of Contextual Document Clustering for Cluster Sentiment Analysis
In this paper we consider whether the thematic document clustering approach of Contextual Document Clustering is able to capture the overall sentiment of a cluster of documents. We provide a novel mechanism to determine the sentiment of a cluster based on the latter approach and assess the approach on three data sets formed from the NY Times annotated corpus. We demonstrate that CDC does provid...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Publication date: 2004